
    Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection

    Pedestrian detection is an important component for the safety of autonomous vehicles, as well as for traffic and street surveillance. There are extensive benchmarks on this topic, and it has been shown to be a challenging problem in real use-case scenarios. In purely image-based pedestrian detection approaches, state-of-the-art results have been achieved with convolutional neural networks (CNN), yet surprisingly few detection frameworks have been built upon multi-cue approaches. In this work, we develop a new pedestrian detector for autonomous vehicles that exploits LiDAR data in addition to visual information. In the proposed approach, LiDAR data is utilized to generate region proposals by processing the three-dimensional point cloud that it provides. These candidate regions are then further processed by a state-of-the-art CNN classifier that we have fine-tuned for pedestrian detection. We have extensively evaluated the proposed detection process on the KITTI dataset. The experimental results show that the proposed LiDAR space clustering approach provides a very efficient way of generating region proposals, leading to higher recall rates and fewer misses for pedestrian detection. This indicates that LiDAR data can provide auxiliary information for CNN-based approaches.
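
    As a minimal sketch of the kind of pipeline this abstract describes (not the paper's exact method), the snippet below clusters a LiDAR point cloud with DBSCAN and projects each cluster's 3D bounding box into the image plane to obtain candidate regions for a CNN classifier; the choice of DBSCAN, its eps/min_samples values, and the projection matrix P are all assumptions.

        # Hypothetical sketch: LiDAR clustering -> image-plane region proposals.
        import numpy as np
        from sklearn.cluster import DBSCAN

        def lidar_region_proposals(points, P, eps=0.5, min_samples=10):
            """points: (N, 3) LiDAR returns; P: (3, 4) camera projection matrix."""
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
            proposals = []
            for lbl in set(labels) - {-1}:  # label -1 marks noise points
                cluster = points[labels == lbl]
                mins, maxs = cluster.min(axis=0), cluster.max(axis=0)
                # 8 corners of the cluster's axis-aligned 3D bounding box
                corners = np.array([[x, y, z]
                                    for x in (mins[0], maxs[0])
                                    for y in (mins[1], maxs[1])
                                    for z in (mins[2], maxs[2])])
                hom = np.hstack([corners, np.ones((8, 1))])  # homogeneous coords
                uvw = hom @ P.T
                uv = uvw[:, :2] / uvw[:, 2:3]                # perspective divide
                proposals.append((uv[:, 0].min(), uv[:, 1].min(),
                                  uv[:, 0].max(), uv[:, 1].max()))
            return proposals  # (x1, y1, x2, y2) crops to score with the CNN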

    Optimality in multiple comparison procedures

    When many (m) null hypotheses are tested with a single dataset, controlling the number of false rejections is often the principal consideration. Two popular error rates to control are the probability of making at least one false discovery (FWER) and the expected fraction of false discoveries among all rejections (FDR). Scaled multiple comparison error rates form a new family that bridges the gap between these two extremes. For example, the Scaled Expected Value (SEV) limits the number of false positives relative to an arbitrary increasing function of the number of rejections, that is, it controls E(FP/s(R)). We discuss the problem of how to choose in practice which procedure to use, with elements of an optimality theory, by considering the number of false rejections FP separately from the number of correct rejections TP. Using this framework, we show how to choose an element of the new family mentioned above. Comment: arXiv admin note: text overlap with arXiv:1112.451
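
    For context, the two classical rates this family bridges can be controlled with textbook procedures; the sketch below implements the Bonferroni (FWER) and Benjamini-Hochberg (FDR) rejection rules. The scaled (SEV) procedure itself is not reproduced here.

        # Classical baselines only; the SEV procedure is not shown.
        import numpy as np

        def bonferroni(pvals, alpha=0.05):
            """Reject H_i when p_i <= alpha / m; controls FWER at level alpha."""
            pvals = np.asarray(pvals)
            return pvals <= alpha / len(pvals)

        def benjamini_hochberg(pvals, alpha=0.05):
            """Step-up procedure controlling FDR at level alpha."""
            pvals = np.asarray(pvals)
            m = len(pvals)
            order = np.argsort(pvals)
            passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
            k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
            rejected = np.zeros(m, dtype=bool)
            rejected[order[:k]] = True  # reject the k smallest p-values
            return rejected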

    Integration of Legacy and Heterogeneous Databases


    Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis

    Synthesizing realistic faces across domains to learn deep models has attracted increasing attention in facial expression analysis, as it helps to improve expression recognition accuracy despite only a small number of real training images being available. However, learning from synthetic face images can be problematic due to the distribution discrepancy between low-quality synthetic images and real face images, and may not achieve the desired performance when the learned model is applied to real-world scenarios. To this end, we propose a new attribute-guided face image synthesis method that performs translation between multiple image domains using a single model. In addition, we adopt the proposed model to learn from synthetic faces by matching the feature distributions between different domains while preserving each domain's characteristics. We evaluate the effectiveness of the proposed approach on several face datasets in terms of generating realistic face images. We demonstrate that expression recognition performance can be enhanced by our face synthesis model. Moreover, we conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess the performance on in-the-wild data for driver emotion recognition. Comment: 8 pages, 8 figures, 5 tables, accepted by FG 2019. arXiv admin note: substantial text overlap with arXiv:1905.0028
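
    One common way to realize the feature-distribution matching mentioned above is a maximum mean discrepancy (MMD) penalty between synthetic and real features; the RBF kernel and bandwidth below are assumptions, since the abstract does not state the actual matching criterion.

        # Hypothetical auxiliary loss for matching feature distributions.
        import torch

        def mmd_rbf(feat_syn, feat_real, bandwidth=1.0):
            """Squared MMD between two (n, d) feature batches, RBF kernel."""
            def k(a, b):
                return torch.exp(-torch.cdist(a, b).pow(2)
                                 / (2 * bandwidth ** 2))
            return (k(feat_syn, feat_syn).mean()
                    + k(feat_real, feat_real).mean()
                    - 2 * k(feat_syn, feat_real).mean())

        # Typical use: loss = task_loss + lam * mmd_rbf(f_synthetic, f_real)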

    Combining Multiple Views for Visual Speech Recognition

    Visual speech recognition is a challenging research problem with a particularly practical application: aiding audio speech recognition in noisy scenarios. Multiple-camera setups can benefit visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view-angle combinations at both the feature level and the decision level. The visual speech recognition system employed in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set increases from 76% for the highest-performing single view (30°) to up to 83% when combining this view with the frontal and 60° view angles.
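
    The decision fusion step described above amounts to combining per-view Viterbi path log-likelihoods before picking the best hypothesis; the sketch below uses a weighted sum, where equal weights are an assumption (the paper may tune them per view).

        # Hypothetical decision-level fusion of per-view log-likelihoods.
        import numpy as np

        def fuse_decisions(loglik_per_view, weights=None):
            """loglik_per_view: (n_views, n_hypotheses) Viterbi log-likelihoods.
            Returns the index of the hypothesis with the highest fused score."""
            L = np.asarray(loglik_per_view, dtype=float)
            if weights is None:
                weights = np.full(L.shape[0], 1.0 / L.shape[0])
            fused = np.asarray(weights) @ L  # weighted sum of log-likelihoods
            return int(np.argmax(fused))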

    A Variational Model for Object Segmentation Using Boundary Information and Shape Prior Driven by the Mumford-Shah Functional

    In this paper, we propose a new variational model to segment an object belonging to a given shape space using the active contour method, a geometric shape prior, and the Mumford-Shah functional. The core of our model is an energy functional composed of three complementary terms. The first is based on a shape model that constrains the active contour toward a shape of interest. The second term detects object boundaries from image gradients. The third term globally drives the shape prior and the active contour towards a homogeneous intensity region. The segmentation of the object of interest is given by the minimum of our energy functional. This minimum is computed with the calculus of variations and the gradient descent method, which provide a system of evolution equations solved with the well-known level set method. We also prove the existence of this minimum in the space of functions with bounded variation. Applications of the proposed model are presented on synthetic and medical images.
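
    Schematically, the three-term energy described above can be written as follows; the weights \beta_i and the exact form of each term are assumptions, since the abstract does not give them:

        F(C) = \beta_1 \, F_{\mathrm{shape}}(C)
             + \beta_2 \oint_C g(|\nabla I|) \, ds
             + \beta_3 \, F_{\mathrm{MS}}(C, u),

    where the first term penalizes deviation from the learned shape space, g is an edge-detecting function that is small on strong image gradients, and the last term is the region-homogeneity (Mumford-Shah) term; the minimizer is sought by gradient descent on a level-set embedding of C.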

    Learn to synthesize and synthesize to learn

    Attribute-guided face image synthesis aims to manipulate attributes of a face image. Most existing methods for image-to-image translation can either perform a fixed translation between any two image domains using a single attribute or require training data with the attributes of interest for each subject. Consequently, these methods can only train one specific model for each pair of image domains, which limits their ability to deal with more than two domains. Another disadvantage of these methods is that they often suffer from mode collapse, which degrades the quality of the generated images. To overcome these shortcomings, we propose an attribute-guided face image generation method that uses a single model capable of synthesizing multiple photo-realistic face images conditioned on the attributes of interest. In addition, we adopt the proposed model to increase the realism of simulated face images while preserving the face characteristics. Compared to existing models, the synthetic face images generated by our method exhibit good photorealistic quality on several face datasets. Finally, we demonstrate that the generated facial images can be used for synthetic data augmentation and improve the performance of the classifier used for facial expression recognition. Comment: Accepted to Computer Vision and Image Understanding (CVIU).
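
    As a minimal sketch of the single-model, attribute-conditioned synthesis idea (the conditioning scheme and layer sizes are illustrative assumptions, not the paper's architecture), the target attribute vector can be broadcast spatially and concatenated with the image channels, so one generator serves every domain pair:

        # Hypothetical StarGAN-style attribute-conditioned generator.
        import torch
        import torch.nn as nn

        class AttributeGenerator(nn.Module):
            def __init__(self, img_channels=3, n_attrs=5, hidden=64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(img_channels + n_attrs, hidden, 3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(hidden, img_channels, 3, padding=1),
                    nn.Tanh(),
                )

            def forward(self, img, attrs):
                # attrs: (B, n_attrs) -> (B, n_attrs, H, W), then concat
                a = attrs[:, :, None, None].expand(
                    -1, -1, img.size(2), img.size(3))
                return self.net(torch.cat([img, a], dim=1))

        # Changing `attrs` switches the target domain; no per-pair model needed.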

    Multiscale Active Contours

    We propose a new multiscale image segmentation model based on the active contour/snake model and the Polyakov action. The concept of scale, a general issue in physics and signal processing, is introduced into the active contour model, a well-known image segmentation model that consists of evolving a contour in images toward the boundaries of objects. The Polyakov action, introduced to image processing by Sochen et al. (1998), provides an efficient mathematical framework for defining a multiscale segmentation model because it generalizes the concept of harmonic maps embedded in higher-dimensional Riemannian manifolds such as multiscale images. Unlike classical multiscale segmentation methods, which work scale by scale to speed up the segmentation process, our model uses all scales simultaneously, i.e. the whole scale space, to introduce the geometry of multiscale images into the segmentation process. The extracted multiscale structures are useful for efficiently improving the robustness and performance of standard shape analysis techniques such as shape recognition and shape registration. Another advantage of our method is that it can use not only the Gaussian scale space but also many other multiscale spaces, such as the Perona-Malik scale space, the curvature scale space, or the Beltrami scale space. Finally, this multiscale segmentation technique is coupled with a multiscale edge-detecting function based on the gradient vector flow model, which is able to extract convex and concave object boundaries independently of the initial condition. We apply our multiscale segmentation model to a synthetic image and a medical image.
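
    For reference, the Polyakov action used by the Beltrami framework of Sochen et al. (1998) measures the weight of a map X from the image manifold (\Sigma, g) into a higher-dimensional embedding manifold (M, h):

        S[X] = \int_\Sigma \sqrt{g} \, g^{\mu\nu} \,
               \partial_\mu X^i \, \partial_\nu X^j \, h_{ij}(X) \, d\sigma^1 d\sigma^2,

    whose minimizers are harmonic maps; in the multiscale setting described above, the scale dimension would be part of that higher-dimensional embedding.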